Development and Transcription of Assamese Speech Corpus

نویسندگان

  • Himangshu Sarma
  • Navanath Saharia
  • Utpal Sharma
  • Smriti Kumar Sinha
  • Mancha Jyoti Malakar
چکیده

A balanced speech corpus is the basic need for any speech processing task. In this report we describe our effort on development of Assamese speech corpus. We mainly focused on some issues and challenges faced during development of the corpus. Being a less computationally aware language, this is the first effort to develop speech corpus for Assamese. As corpus development is an ongoing process, in this paper we report only the initial task. Keywords-Speech Corpus; Assamese;Transcription

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Part of Speech Tagger for Assamese Text

Assamese is a morphologically rich, agglutinative and relatively free word order Indic language. Although spoken by nearly 30 million people, very little computational linguistic work has been done for this language. In this paper, we present our work on part of speech (POS) tagging for Assamese using the well-known Hidden Markov Model. Since no well-defined suitable tagset was available, we de...

متن کامل

Assamese Numeral Corpus for Speech Recognition using Cooperative ANN Architecture

Speech corpus is one of the major components in a Speech Processing System where one of the primary requirements is to recognize an input sample. The quality and details captured in speech corpus directly affects the precision of recognition. The current work proposes a platform for speech corpus generation using an adaptive LMS filter and LPC cepstrum, as a part of an ANN based Speech Recognit...

متن کامل

A Structured Approach for Building Assamese Corpus: Insights, Applications and Challenges

To study about various naturally occurring phenomenons on natural language text, a well structured text corpus is very much essential. The quality and structure of a corpus can directly influence on performance of various Natural Language Processing applications. Assamese is one of the major Indian languages used by the people of north east India. Language technology development works in Assame...

متن کامل

A Study on Detection of Intonation Events of Assamese Speech Required for Tilt Model

This paper has done a study and experimental analysis on different intonation events of Assamese speech. Assamese is a North East Indian language and spoken by lacks of people in India. The researchers need intonation model to identify language specific intonation events, which are necessary for synthesis process of that particular language. The paper shows outcomes of some experiments done wit...

متن کامل

Development of Speech corpora for different Speech Recognition tasks in Malayalam language

Speech corpus is the backbone of an Automatic speech Recognition system. This paper presents the development of speech corpora for different speech recognition tasks in Malayalam language. Pronunciation dictionary and Transcription file which are the other two essential resources for building a speech recognizer are also being created. Speech recognition performance of different speech recognit...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1309.7312  شماره 

صفحات  -

تاریخ انتشار 2013